Search in the Catalogues and Directories

Hits 1 – 20 of 30

1
Shapley Idioms: Analysing BERT Sentence Embeddings for General Idiom Token Identification
In: Front Artif Intell (2022)
BASE
2
Semantic Relatedness and Taxonomic Word Embeddings ...
BASE
3
English WordNet Taxonomic Random Walk Pseudo-Corpora
In: Conference papers (2020)
BASE
4
Language related issues for machine translation between closely related south Slavic languages
Arcan, Mihael; Klubicka, Filip; Popovic, Maja. - : The COLING 2016 Organizing Committee, 2019
BASE
5
Synthetic, Yet Natural: Properties of WordNet Random Walk Corpora and the impact of rare words on embedding performance
In: Conference papers (2019)
BASE
6
Size Matters: The Impact of Training Size in Taxonomically-Enriched Word Embeddings
In: Articles (2019)
BASE
7
Training corpus hr500k 1.0
Ljubešić, Nikola; Agić, Željko; Klubička, Filip. - : Jožef Stefan Institute, 2018
BASE
8
Quantitative Fine-Grained Human Evaluation of Machine Translation Systems: a Case Study on English to Croatian ...
BASE
9
Is it worth it? Budget-related evaluation metrics for model selection ...
BASE
10
Quantitative Fine-grained Human Evaluation of Machine Translation Systems: a Case Study on English to Croatian
In: Articles (2018)
BASE
11
Is it worth it? Budget-related evaluation metrics for model selection
In: Conference papers (2018)
Abstract: Projects that set out to create a linguistic resource often do so by using a machine learning model that pre-annotates or filters the content that goes through to a human annotator, before going into the final version of the resource. However, available budgets are often limited, and the amount of data that is available exceeds the amount of annotation that can be done. Thus, in order to optimize the benefit from the invested human work, we argue that the decision on which predictive model one should employ depends not only on generalized evaluation metrics, such as accuracy and F-score, but also on the gain metric. The rationale is that the model with the highest F-score may not necessarily have the best separation and sequencing of predicted classes, thus leading to the investment of more time and/or money on annotating false positives, yielding zero improvement of the linguistic resource. We exemplify our point with a case study, using real data from a task of building a verb-noun idiom dictionary. We show that in our scenario, given the choice of three systems with varying F-scores, the system with the highest F-score does not yield the highest profits. In other words, we show that the cost-benefit trade-off can be more favorable if a system with a lower F-score is employed.
Keyword: budget; Computational Engineering; Digital Humanities; F-score; gain; idiom dictionary; idiom identification; linguistic resource creation; model evaluation; Other Computer Engineering
URL: https://arrow.tudublin.ie/cgi/viewcontent.cgi?article=1234&context=scschcomcon
https://arrow.tudublin.ie/scschcomcon/227
BASE
12
hr500k – A Reference Training Corpus of Croatian.
In: Conference papers (2018)
BASE
13
Croatian Twitter training corpus ReLDI-NormTag-hr 1.1
Ljubešić, Nikola; Farkaš, Daša; Klubička, Filip. - : Jožef Stefan Institute, 2017
BASE
14
Serbian Twitter training corpus ReLDI-NormTag-sr 1.0
Ljubešić, Nikola; Farkaš, Daša; Klubička, Filip. - : Jožef Stefan Institute, 2017
BASE
15
Croatian Twitter training corpus ReLDI-NormTag-hr 1.0
Ljubešić, Nikola; Farkaš, Daša; Klubička, Filip. - : Jožef Stefan Institute, 2017
BASE
16
Serbian Twitter training corpus ReLDI-NormTag-sr 1.1
Ljubešić, Nikola; Farkaš, Daša; Klubička, Filip. - : Jožef Stefan Institute, 2017
BASE
17
Fine-grained human evaluation of neural versus phrase-based machine translation ...
BASE
18
Fine-Grained Human Evaluation of Neural Versus Phrase-Based Machine Translation
In: Prague Bulletin of Mathematical Linguistics, Vol 108, Iss 1, Pp 121-132 (2017)
BASE
19
Serbian-English parallel corpus srenWaC 1.0
Ljubešić, Nikola; Esplà-Gomis, Miquel; Ortiz Rojas, Sergio. - : Jožef Stefan Institute, 2016
BASE
20
Finnish-English parallel corpus fienWaC 1.0
Ljubešić, Nikola; Esplà-Gomis, Miquel; Ortiz Rojas, Sergio. - : Jožef Stefan Institute, 2016
BASE

© 2013 - 2024 Lin|gu|is|tik